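Turning raw observations into structured R objects is a prerequisite for probabilistic analysis. Before modeling distributions, we need to master data ingestion and the structural differences between lists, matrices, and data frames.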
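1. Structured Ingestion
Reading a file with scan() usually calls for a dummy list structure that defines the variable types (for example, list(id="", x=0)). This ensures that external data from a file such as input.dat is parsed into manageable components rather than a flat vector.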
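The template idea can be sketched as follows. This is a minimal illustration, not the lesson's own code: the file contents are invented for the example, and the file is written to a temporary directory rather than assumed to exist.

```r
# Write a small sample file; its name and contents are illustrative only.
path <- file.path(tempdir(), "input.dat")
writeLines(c("a1 3.5", "b2 4.25", "c3 7"), path)

# The second argument is a template: "" marks a character field, 0 a numeric one.
inp <- scan(path, what = list(id = "", x = 0), quiet = TRUE)

str(inp)   # a list with a character component id and a numeric component x
inp$x      # the numeric field as a plain vector
```

Because scan() cycles through the template field by field, each line of the file contributes one value to id and one to x.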
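2. Dimensional Organization
A matrix holds a homogeneous collection of numeric values (pass byrow=TRUE to fill it row by row), whereas data.frame() is the key bridge to statistical modeling because it lets heterogeneous data types coexist.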
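A short sketch of the contrast, using made-up values: the same vector is laid out differently depending on byrow, and a data frame mixes column types that a matrix could not hold together. The variable names are hypothetical.

```r
vals <- 1:6

by_col <- matrix(vals, ncol = 3)                 # default: filled column by column
by_row <- matrix(vals, ncol = 3, byrow = TRUE)   # filled row by row

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3

# A data frame can hold columns of different types side by side.
df <- data.frame(id = c("a1", "b2"), x = c(3.5, 4.25), treated = c(TRUE, FALSE))
sapply(df, class)
```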
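3. Variable Accessibility
Accessing data for inference means indexing by position, as in inp[[1]], or by named column, as in inp$id. Functions such as attach() make the variables of a whole object (for example, eruptions) directly accessible without repeated indexing.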
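The access patterns above might look like this in practice. The list contents are invented, and the use of the built-in faithful data set (which has a column named eruptions) is an assumption about which object the text has in mind.

```r
inp <- list(id = c("a1", "b2", "c3"), x = c(3.5, 4.25, 7))

first_by_index <- inp[[1]]   # first component, by position
first_by_name  <- inp$id     # the same component, by name
identical(first_by_index, first_by_name)   # TRUE

# Assumption: faithful from R's datasets package stands in for the attached object.
attach(faithful)
mean_attached <- mean(eruptions)   # found via the search path, no faithful$ needed
detach(faithful)                   # detach afterwards to avoid masking surprises
```

Pairing every attach() with a detach() keeps the search path clean; with(faithful, mean(eruptions)) is a common alternative that needs no cleanup.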
QUESTION 1
What is the role of the second argument in scan("file", list(id="", x=0))?
It defines a dummy list structure to set variable types.
It limits the number of rows read from the file.
It converts the file into a numeric matrix automatically.
It sets the column names for a data frame.
✅ Correct!
The list serves as a template, telling R that the first column is a string and the second is numeric.
❌ Incorrect
The list specifies the 'mode' or type of data to be read for each field.
QUESTION 2
Which parameter ensures data is filled row-wise during matrix construction?
ncol = TRUE
byrow = TRUE
arrange = "rows"
dim = c(n, m)
✅ Correct!
By default, R fills matrices by column. byrow=TRUE changes this so that each record in the input stays together as one row.
❌ Incorrect
R's default matrix filling is column-major; byrow=TRUE is required for row-major fills.
QUESTION 3
If inp is a list containing experimental data, how do you access the first component?
inp(1)
inp[[1]]
inp->first
inp.1
✅ Correct!
Double square brackets [[ ]] are the standard R syntax for extracting a single element from a list.
❌ Incorrect
In R, [[ ]] is used for list indexing, whereas ( ) is for function calls.
QUESTION 4
What does the attach() function do in this context?
It merges two data frames into one.
It makes data frame variables directly accessible by name.
It downloads a package from the repository.
It saves the whole object to a .dat file.
✅ Correct!
attach() puts the data frame columns into the search path, allowing direct access by name.
❌ Incorrect
attach() is for visibility/masking, not for merging or downloading.
QUESTION 5
Which function allows for interactive spreadsheet-like editing of a data object?
update()
edit()
library()
scan()
✅ Correct!
edit() and fix() open a GUI editor for manual data correction.
❌ Incorrect
edit() invokes the visual data editor, while scan() is for file ingestion.
Case Study: Enzyme Kinetics Data Structuring
Transforming Raw Experimental Output
A researcher has raw enzyme data in 'input.dat'. They need to load it into R, inspect it for errors, and prepare it for a t-test comparing treated and untreated reaction rates, following the logic of the 'Puromycin' dataset.
Q
1. Provide the code to load 'Puromycin' and open it for manual correction.
Solution:
data(Puromycin, package="datasets"); xnew <- edit(Puromycin)
Q
2. How would the researcher construct a 5-column matrix from a raw numeric file 'light.dat'?
Solution:
X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
Q
3. After loading a data frame, how can the researcher access columns like 'conc' directly?
Solution:
By using attach(data_frame_name). This allows the researcher to use conc in functions without the $ operator.
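As a rough sketch of how the case study could end, the snippet below attaches the built-in Puromycin data, uses conc and rate directly, and runs the treated-versus-untreated comparison. The choice of t.test() with its default (Welch) settings is an assumption; the case study only says that a t-test is planned.

```r
data(Puromycin, package = "datasets")

attach(Puromycin)
conc_range <- range(conc)   # conc is found without writing Puromycin$conc
# Compare reaction rates between the two levels of the state factor.
tt <- t.test(rate[state == "treated"], rate[state == "untreated"])
detach(Puromycin)

conc_range
tt$p.value
```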